Text: now in 2D! A framework for lexical expansion with contextual similarity
نویسندگان
چکیده
A new metaphor of two-dimensional text for data-driven semantic modeling of natural language is proposed, which provides an entirely new angle on the representation of text: not only syntagmatic relations are annotated in the text, but also paradigmatic relations are made explicit by generating lexical expansions. We operationalize dis-tributional similarity in a general framework for large corpora, and describe a new method to generate similar terms in context. Our evaluation shows that distributional similarity is able to produce high-quality lexical resources in an unsupervised and knowledge-free way, and that our highly scalable similarity measure yields better scores in a WordNet-based evaluation than previous measures for very large corpora. Evaluating on a lexical substitution task, we find that our contextualization method improves over a non-contextualized base-line across all parts of speech, and we show how the metaphor can be applied successfully to part-of-speech tagging. A number of ways to extend and improve the contextualization method within our framework are discussed. As opposed to comparable approaches, our framework defines a model of lexical expansions in context that can generate the expansions as opposed to ranking a given list, and thus does not require existing lexical-semantic resources.
منابع مشابه
Text Categorization from Category Name via Lexical Reference
Requiring only category names as user input is a highly attractive, yet hardly explored, setting for text categorization. Earlier bootstrapping results relied on similarity in LSA space, which captures rather coarse contextual similarity. We suggest improving this scheme by identifying concrete references to the category name’s meaning, obtaining a special variant of lexical expansion.
متن کاملThe Impact of Contextual Clue Selection on Inference
Linguistic information can be conveyed in the form of speech and written text, but it is the content of the message that is ultimately essential for higher-level processes in language comprehension, such as making inferences and associations between text information and knowledge about the world. Linguistically, inference is the shovel that allows receivers to dig meaning out from the text with...
متن کاملA corpus based approach to find similar keywords for search engine marketing
Automatic thesaurus generation is used by search engines for query expansion. The same concept is used by search engine marketing companies to suggest keyword terms to their clients to improve the client’s ratings for different search engines. This paper presents and evaluates a corpus based method to find similar terms. The corpus is generated by scraping websites in different categories. A fe...
متن کاملEmotion Detection in Persian Text; A Machine Learning Model
This study aimed to develop a computational model for recognition of emotion in Persian text as a supervised machine learning problem. We considered Pluthchik emotion model as supervised learning criteria and Support Vector Machine (SVM) as baseline classifier. We also used NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...
متن کاملSecond-Order Cohesion
Similarity in contextual behavior between words is considered a source of “lexical cohesion,” which is otherwise hard to measure or quantify. Such contextual similarity is used by an implementation for text segmentation, the VecTile system, which uses precompiled vector representations of words to produce similarity curves over texts. The performance of this system is shown to improve over that...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- J. Language Modelling
دوره 1 شماره
صفحات -
تاریخ انتشار 2013